Adding agg_for_doc_count even if distinct count is present in query#29
Merged
expani merged 1 commit intovinaykpud:feature/substrait-planfrom Dec 23, 2025
Merged
Conversation
Signed-off-by: expani <anijainc@amazon.com>
a0b649f
into
vinaykpud:feature/substrait-plan
5 of 33 checks passed
Closed
3 tasks
abhita
added a commit
to abhita/sql
that referenced
this pull request
Jan 5, 2026
Merge pull request vinaykpud#29 from expani/feature/substrait-plan Adding agg_for_doc_count even if distinct count is present in query Adding agg_for_doc_count even if distinct count is present in query Signed-off-by: expani <anijainc@amazon.com> Merge pull request vinaykpud#28 from expani/df_lucene_path_sql Skip generating and setting the Substrait plan based on Index Setting Updated to test to use new index setting Signed-off-by: expani <anijainc@amazon.com> Using Optimized index setting to not generate substrait for existing path Signed-off-by: expani <anijainc@amazon.com> Resolved Sort Pushdown and updated integration tests queries and results (vinaykpud#27) * Added sort in pushdown context Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Removed unwanted sort pushdown in pushDownSortAggregateMeasure Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Accessed the fullReNodeTree in CalciteEnumerableIndexScan removed the redundancy in OpenSearchRequest Also i did this since earlier way of accessing RelNode from ThreadLocal from OpenSearchRequest class is not working as it looks like the thread which sets the variable and accesses it now are different Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Updated some integ test queries and results Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> --------- Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> Added substrait type conversion support for IP field (vinaykpud#26) Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> Use Full RelNode tree for execution Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> Fixed the sort condition for queries with multiple aggregation (vinaykpud#24) without this change if we have a PPL query with multiple aggregations and sort condition, it will sort only based on the first aggregated field. Also found similar changes done in latest mainline: https://tinyurl.com/u32kpyk9 Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> Added sort after aggregarte relNode for pushdown context (vinaykpud#23) Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> Added logs for reconstructPushedDownRelNodeTree (vinaykpud#22) Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> Added min function for string field and fixed query 29 in tests (vinaykpud#20) * Added min function for string field Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Fixed the query 29 PPL Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> --------- Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> All changes related to convert only pushed down operations as RelNodeTree and Substrait (vinaykpud#19) * Changes to convert only pushed down operations RelNode and Substrait Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Adding the missing Project operator after the filter Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Small refactors in PushDownContext and logs Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Added full relNode log Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Removed unwanted comments Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Use Direct RelNode.copy() Instead of RelBuilder Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Disabled CoreRules.FILTER_REDUCE_EXPRESSIONS this is done to disable the calcite optimization on timerange queries ie calcite optimizes `>=($0, TIMESTAMP('2013-07-01 00:00:00')) AND <=($0, TIMESTAMP('2013-07-31 00:00:00'))` to `SEARCH($0, Sarg[['2013-07-01 00:00:00'..'2013-07-31 00:00:00']])` Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> --------- Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
we are missing this check in SQL plugin causing
agg_for_doc_countto be not added as an aggregate when Distinct Cardinality Aggregate is part of the query.Adding this the plan's look like follows
doc_countis expected for all scenarios.This is required for merging at Coordinator to work properly. See https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/bucket/terms/InternalTerms.java#L395
There are multiple places that rely on doc count for instance https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/bucket/terms/InternalTerms.java#L474 relies on the same to prune buckets lower that a certain doc count.
More context in opensearch-project/OpenSearch#20263